This is a continuation of application Ser. No. 08/[illegible]81,015,
filed Jan. 30, 1995, now abandoned, which is a continuation of
application Ser. No. [illegible], filed [illegible] 6, 1992, now
abandoned.

BACKGROUND OF THE INVENTION
1. Field of the Invention

[illegible] information among master [illegible] transmission lines
[illegible]

Generally, a bus comprises a [illegible] and data [illegible] to which
the devices are coupled. [illegible] forming the bus. [illegible] is
communicated across the bus in many [illegible] format is a packet
format [illegible] packets for transmission on the bus [illegible]
multiple clock cycles. [illegible] of a bus which utilizes [illegible]
described in PCT international patent [illegible] 02590 [illegible]
reduce the latency [illegible] the bus interface.

SUMMARY AND OBJECTS OF THE 
INVENTION 

[illegible] devices [illegible]

It is further an object [illegible] high speed bus in a packet format
for transmission across [illegible] which the block size decoding at
the receiving device is simplified, thereby increasing the speed at
which the receiving device processes the information.

It is an object of the present invention to provide a packet
[illegible] request packet.

BRIEF DESCRIPTION OF THE DRAWINGS 
The objects, features and advantages of the present invention will
become apparent to one skilled in the art from reading the following
detailed description in which:

FIG. 1 illustrates a prior art packet format utilized in a
[illegible]

FIG. 2 is a block diagram illustration of an illustrative high speed
bus structure.

FIG. 3 illustrates a preferred embodiment of the packet format of the
present invention.

FIG. 4 illustrates another embodiment of the packet format of the
present invention in which active collision detection of packets is
performed.

FIG. 5a and FIG. 5b illustrate the decrease in length of the carry
chain by [illegible] of the information in the packet [illegible]

FIGS. 6a and 6b illustrate the innovative encoding of bits for
generation of byte masks utilized.

FIGS. 7a, 7b, 7c and 7d illustrate the innovative encoding technique
employed for byte transfer of varying lengths.

DETAILED DESCRIPTION

The request packet format is designed for use on a high [illegible]
devices such as [illegible] random access memory [illegible]
information needed by the master devices for communication with the
slave devices coupled to the bus. The bus architecture includes the
following signal transmission lines: BusCtl, BusData[8:0], as well as
clock signal lines and power and ground lines. These lines are
connected in parallel to each device as illustrated in FIG. 2.

The processors communicate with the DRAMs to read and write data to
the memory. The processors form request packets which are communicated
to the DRAMs by transmitting the bits on predetermined transmission
lines at a predetermined time sequence (i.e., at predetermined clock
cycles). The bus interface of the DRAM receiver processes the
information received to determine the type of memory request, the
address of the memory request and the number of bytes of the
transaction. The DRAMs then perform the memory operation indicated by
the request packet.

The memory address consists of the row address, which is used during
the row address strobe (RAS) in the DRAM, and the column address,
which is used during the column address strobe (CAS) in the DRAM. The
DRAMs have the capability to operate in normal RAS access mode or in
page mode. When operable in page mode, if a subsequent request to
access data is directed to the same row, the DRAM does not need to
wait for receipt of the row address and to assert RAS, as RAS has been
asserted during the previous memory access. Thus, the access time for
this data is shortened. For further discussion regarding page mode
DRAMs, see Steve L. Gimmi, Carl T. Dreher, Unraveling the Intricacies
of Dynamic RAM, Electronic Design News, pp. 155-165 (Mar. 30, 1989).

The request packet format further helps to improve the performance of
the DRAMs in response to memory requests for page mode access. The
DRAMs use the lower order portion of the memory address as the column
address bits. This provides a locality of reference such that bytes of
memory which are logically contiguous will be physically contiguous in
the memory space. The resultant effect is that a greater number of
logically contiguous bytes of memory are also physically contiguous
and the frequency of page mode accesses is increased.

To further increase the access speed for a memory request, the lower
order bits are placed at the beginning of the packet. This is
illustrated in FIG. 3, where address bits Address[9:2] are placed in
the first word of the packet and bits Address[17:8] are placed in the
second word of the packet. By placing the lower order bits at the
beginning of the packet, those memory accesses performed in page mode
can be processed at least two cycles earlier, further increasing the
performance of the memory accesses.

As the lower order bits of the memory address are placed in the first
two words of the packet, little room is left at the beginning of the
packet for op code bits, op[3:0], which identify the type of memory
operation to be performed (e.g., page mode access). However, as the
memory operation type needs to be determined in order to perform the
memory operation, the op code bits need to be transmitted early in the
packet. In the packet format of the present invention, the BusCtl line
and the most significant bit of the Data signal line, BusData[8], are
utilized to transmit the op code bits. The bits are transmitted within
the first 4 words of the packet, coincident with the transmission of
the memory address. Preferably, the memory operation types are coded
in such a manner that the bits transmitted coincident with the lower
order bits of the memory address indicate whether a page mode memory
operation is to be performed.

At bus cycle zero, the BusCtl line is used to indicate the start of
the packet.

When multiple devices are transmitting on a bus, the possibility of
packet collisions exists. Many different techniques are employed to
avoid the concurrent transmission of multiple packets on a bus. For
example, the master devices keep track of all pending transactions, so
that each master device knows when it can send a request packet and
access the corresponding response. However, the master devices will
occasionally transmit independent request packets during the same bus
cycle. Those multiple requests will collide as each master device
drives the bus simultaneously with different information, resulting in
scrambled request information. Prior art techniques for detecting and
responding to collisions generally have been found to be too slow for
high speed buses. Thus a mechanism for the detection of packet
collisions on high speed buses is needed.

Typically two types of collisions will occur: those which are
completely aligned, in which two or more master devices start
transmission at exactly the same cycle, and those which are unaligned,
in which two or more master devices start transmission at different
cycles which are close enough together to cause overlap of the request
packets. In PCT international patent application number
PCT/US91/02590, filed Apr. 16, 1991, published Oct. 31, 1991, and
entitled Integrated Circuit I/O Using a High Performance Bus
Interface, collisions were detected by the master devices and signals
indicating the existence of the collision were subsequently sent by
the master devices to the slave devices. This technique requires the
master devices to process the detection of a collision and drive the
bus to notify the slave devices in a very short period of time. To
eliminate the need for the master device to notify the slave device of
the collision, the master devices and the slave devices detect and
process the existence of a collision in parallel.

Additional bits of the packet are preallocated to store a code which
identifies the master device transmitting the packet. This is
illustrated in the packet format shown in FIG. 4. At bus cycles 4 and
5, the processor device code, Master[3:0], is transmitted. If two
master devices issue packet requests starting at the same bus cycle,
the master device codes, Master[3:0], will be logically ORed together,
resulting in a different code. This is detected in parallel by the
master devices and slave devices which are monitoring the bus signal
lines. The slave devices immediately respond by discarding the packets
received, and an arbitration is performed to determine priority of
master device access to the bus for retransmission of the request
packets.
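
The aligned-collision check can be sketched in a few lines of Python.
This is only an illustrative model under stated assumptions: the bus
is treated as a wired-OR of whatever each master drives, and the names
wired_or and sees_collision are hypothetical, not taken from this
specification. Only the check made by a transmitting master is
modeled (comparing the code it drove with the code it reads back);
how a slave device recognizes the combined code is not modeled.

```python
def wired_or(*driven_codes):
    """Model the Master[3:0] lines as a wired-OR of all driven codes (assumption)."""
    observed = 0
    for code in driven_codes:
        observed |= code & 0xF  # Master[3:0] is a 4-bit field
    return observed


def sees_collision(own_code, observed):
    """A transmitting master flags a collision when the bus differs from what it drove."""
    return (own_code & 0xF) != observed


# One master alone: the code reads back unchanged.
alone = wired_or(0b0101)
print(sees_collision(0b0101, alone))

# Two masters starting at the same cycle: the codes are ORed together.
both = wired_or(0b0101, 0b0011)
print(bin(both), sees_collision(0b0101, both), sees_collision(0b0011, both))
```

In this simple model, when the set bits of one code are a subset of
the other's, only one of the two masters observes a mismatch, so the
choice of device codes matters.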

An unaligned collision condition arises when a first master device
issues a request packet at cycle 0 and a second master device issues a
later packet starting, for example, at cycle 2 of the first request
packet, thereby overlapping the first request packet. This will occur
as the bus operates at high speeds, and the logic in the second master
device may not be fast enough to detect a request initiated by the
first master at cycle 0 and delay its own request. As the collision
occurs during the later clock cycles of the first packet, it is
critical that the slave device receiving the request know of the
collision before completion of transmission of the request packet so
that the packet can be discarded before the slave device responds to
the request. The high speed of the bus increases the difficulty of the
master device timely notifying the slave device of the occurrence of a
collision. Therefore a second innovative collision detection mechanism
is used for unaligned collisions. The BusCtl signal line is used at
the first bus cycle to indicate the start of a packet. Referring to
FIG. 4, BusCtl is now also utilized at predetermined bus cycles for
collision detection, which increases the speed at which a collision is
detected and responded to.

The BusCtl line is monitored by the slave devices as well as the
master devices to detect collisions. The BusCtl line at bus cycles at
which a subsequent packet may be initiated is normally driven to a low
or off state. In the present embodiment, packets are initiated on even
clock cycles; therefore the BusCtl line during clock cycles 2 and 4 is
normally driven to a low or off state. When a collision occurs, the
BusCtl line at one or both cycles will be driven to an on or one state
due to the overlap of the data, specifically the start packet signal
of a subsequent packet. Both master devices and slave devices monitor
BusCtl for information such as the start of the packet. Upon detecting
an on or one state at cycles 2 and/or 4, the slave devices immediately
know that a collision has occurred and eliminate the packet being
received. Thus there is no requirement for the master device to notify
the slave device, no delay in responding to a collision and no
possibility that the transmission of the packet is completed before
the slave device is notified of the collision.

The master devices also monitor the BusCtl signal line for the
occurrence of a packet collision. Upon detection of an on state at
cycle 2 and/or 4, the transmitting master devices will arbitrate
access to the bus and retransmit the packets to ensure accurate
transmission of the packets. Thus, the technique described enables the
slave devices to immediately detect the occurrence of a collision and
discard the packets before the slave devices respond to the requests.
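
The unaligned-collision check on BusCtl can be sketched as follows.
This is a simplified model, not the circuit of the specification: it
assumes a six-cycle window (cycles 0 through 5, the cycles mentioned
in the text), models only the start-of-packet pulse that each packet
contributes to BusCtl, and combines overlapping packets as a
wired-OR. The names PACKET_CYCLES, busctl_trace and collision_detected
are hypothetical.

```python
PACKET_CYCLES = 6      # assumed window: cycles 0..5 of the first packet
CHECK_CYCLES = (2, 4)  # even cycles at which a later packet could start


def busctl_trace(start_cycles):
    """Return BusCtl per cycle, relative to the first packet.

    Only start-of-packet pulses are modeled (assumption); any other use
    of BusCtl during the packet is ignored in this sketch.
    """
    trace = [0] * PACKET_CYCLES
    for start in start_cycles:
        if 0 <= start < PACKET_CYCLES:
            trace[start] = 1  # wired-OR: any start pulse drives the line high
    return trace


def collision_detected(trace):
    """Any device watching BusCtl flags a collision if cycle 2 or 4 is high."""
    return any(trace[cycle] for cycle in CHECK_CYCLES)


print(collision_detected(busctl_trace([0])))      # single packet
print(collision_detected(busctl_trace([0, 2])))   # second packet starts at cycle 2
```

Because every device evaluates the same line at the same cycles, the
masters and slaves reach the same conclusion without any notification
step between them.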

The encoding and decoding of the number of bytes or "count" for a
memory operation also plays a significant role in decreasing the
latency of processing the transaction. In the high speed bus which
utilizes the packet format of the present invention, a balance is
achieved between the number of bits required to encode the byte count
for the memory transaction and the complexity of logic at the receiver
interface of the memory device and the speed of operating the same.
Referring to FIG. 3, a total of eight bits are used, Count[7:0].
Although the bits could have been transmitted during the same cycle
across parallel transmission lines, the bits have been deliberately
organized across two sequential bus cycles and transmitted across
adjacent transmission lines. By placing the information on adjacent
transmission lines in sequential bus cycles, the amount of wiring
required to move the data received in the receiver from the bus input
to the receiver logic which determines the count is decreased, as
there are simply shorter distances between the data inputs. This is
illustrated by the block diagrams set forth in FIGS. 5a and 5b.

FIG. 5a is a simplified representation of a physical implementation of
a slave device bus interface. In this illustration, count bits [7:2]
are transmitted across the bus during one clock cycle on parallel bus
lines. The bits are received at the inputs of the bus interface 100,
105, 110, 115, 120 and 125. Once these bits are received, the bits are
processed through logic components (not shown) which provide a counter
function which counts the number of quadbytes to be transferred. The
implementation of this counter requires a carry chain to be built. The
decrease in length of the carry chain by placing the count bits on
adjacent transmission lines in sequential bus cycles is conceptually
illustrated by FIGS. 5a and 5b. The length of the wire 130 needed to
form the carry chain for the single clock cycle transmission as shown
in FIG. 5a is much greater than the length of wire 135 used to form
the carry chain for the bits transmitted sequentially and in parallel
as shown in FIG. 5b. The decrease in wire length minimizes the amount
of die area required at the receiver and further affects the speed of
data along critical paths and thus the latency for decoding the count
information.

To simplify the implementation of the receivers of the memory devices
as well as reduce the die size of the receiver and decrease the
latency for processing bus transactions, data is accessed in the
memory in groups of four bytes, referred to herein as "quadbytes".
Although the discussion below is directed to the transmission of data
in quadbytes, it will be obvious to one skilled in the art from
reading the following discussion that the concepts can be extended to
any multiple byte organization.

The count bits not only identify the number of bytes to be transmitted
starting at the identified memory address, but also the location of
the bytes in the quadbyte transmitted. For example, the memory address
of the request identifies a location within a quadbyte. To eliminate
those bytes not requested, the memory device will mask out the
unwanted bytes. The mask is also determined from the count value. In
the preferred embodiment, the memory device masks out unwanted bytes
during write transactions. During read transactions all bytes of the
quadbyte are transferred across the bus. The processor then eliminates
those bytes of the first and last quadbyte received which were not
requested. This is preferred because it simplifies the implementation
of the data path inside the memory devices. For example, in the
preferred embodiment, this eliminates the need for a space consuming
and time consuming data alignment network to ensure proper sequencing
of individual bytes. The additional logic that would be required to
support the masking and other functions, such as the data alignment
network, at the memory devices contributes to increasing the
complexity of the chip as well as increasing the die size. However, it
should be realized that the memory device can be configured to perform
masking operations on both read and write transactions in order to
eliminate any unwanted bytes of quadbytes prior to transmission across
the bus.

A processor wishing to formulate a memory request will have an
internal byte address, MasterAddress[35:0], and an internal byte
length, MasterCount[7:0], for the data to be transferred pursuant to
the request. Using offset-by-one encoding, the convention used is as
follows: MasterCount[7:0]=00000000 indicates one byte and
MasterCount[7:0]=11111111 indicates 256 bytes. The processor converts
these internal values into the values for the request packet according
to the following:

Address[35:0]=MasterAddress[35:0]

Overflow, Count[7:0]=MasterAddress[1:0]+MasterCount[7:0]

The result of adding MasterAddress[1:0] to MasterCount[7:0] serves
several purposes. First, the overflow field indicates to the
requesting processor device that although the size of its request is
less than the maximum number of bytes allowed in a transaction, the
quadbyte granularity does not allow this to occur and the request
should be separated into two separate transactions. Second, the sum
produces a count of the number of quadbytes to be transmitted in
Count[7:2], which is the granularity of the basic data transport units
of the bus. Third, it provides an index in Count[1:0] to the last byte
to be transported during the last quadbyte of the data packet.
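
The conversion performed by the processor can be illustrated with a
short Python sketch. It follows the stated relation Overflow,
Count[7:0] = MasterAddress[1:0] + MasterCount[7:0], treating the sum
as a 9-bit value whose top bit is the overflow. The function names
encode_count and split_count are hypothetical and chosen only for
this illustration.

```python
def encode_count(master_address, master_count):
    """Return (overflow, count) for a request.

    master_count uses offset-by-one encoding: 0 means 1 byte, 255 means 256.
    """
    total = (master_address & 0x3) + (master_count & 0xFF)  # 9-bit sum
    overflow = total >> 8   # set when the request must be split in two
    count = total & 0xFF    # Count[7:0] placed in the request packet
    return overflow, count


def split_count(count):
    """Split Count[7:0] into (Count[7:2], Count[1:0])."""
    return count >> 2, count & 0x3


# Five bytes starting at byte 3 of a quadbyte: MasterCount = 4.
overflow, count = encode_count(0x1003, 4)
quadbytes_field, last_byte_index = split_count(count)
print(overflow, quadbytes_field, last_byte_index)
```

In the example, Count[7:2] is 1, which under offset-by-one encoding
means two quadbytes, and Count[1:0] is 3, the index of the last
requested byte within the second quadbyte.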

Because the processor supplies the index of the last byte to be
transported, the memory device does not need to perform any index
arithmetic but instead need only perform a table lookup of the mask
data plus a simple logic operation. This reduces the critical path by
eliminating the carry chain of the addition. Although the operation is
performed by the requesting processor, the processor, unlike the
memory device, can typically overlap the addition with other
operations such that the effect is minimized. An implementation
advantage is achieved which simplifies the receiver of the memory
devices by performing the addition at the processor. Typically there
are more memory devices than processor devices. It is therefore
advantageous to decrease the die size and logic complexity in each of
the memory devices in exchange for modestly increasing the complexity
of the processor devices to perform this function.

Thus Address[1:0] and Count[1:0] are used to generate the masks for
the first and last quadbytes of the memory request. The masks are used
to determine which bytes within a quadbyte are to be read or written.
Masks of varying values are generated only for the first and last
quadbytes because all the bytes of the intervening quadbytes will be
part of the transaction and the masks therefore have a value of 1111.
FIGS. 6a and 6b are tables which respectively illustrate the lookup
tables for the mapping of Address[1:0] to Mask[3:0] to generate the
mask for the first quadbyte, and the mapping of Count[1:0] to
Mask[7:4] to generate the mask for the last quadbyte. A value of one
in the mask indicates that the byte is one of the bytes of the memory
request. Mask[3:0] applies to the first quadbyte at
Memory[Address][3:0][8:0]. Mask[7:4] applies to the last quadbyte at
Memory[Address+Count][3:0][8:0] ([3:0] identifies the byte of the
quadbyte and [8:0] identifies the bit of the byte).

FIGS. 7a-7d illustrate masks generated for byte transfers of various
sizes. Referring to FIG. 7a, a single byte transfer is described. A
single byte transfer is an illustration of a case where the first and
last quadbyte is the same quadbyte. However, the innovative encoding
employed accommodates single quadbyte transfers through simple logic
operations which result in simple and space saving logic at the
receiver. If Count[7:2] is 000000, the offset-by-one encoding
indicates that the transfer is a single quadbyte. When Count[7:2]
equals 000000, the Mask[7:4] and Mask[3:0] fields are logically ANDed
together to generate the byte mask for the quadbyte.
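
The mask generation can be sketched in Python as follows. The tables
of FIGS. 6a and 6b are not reproduced in this text, so the mapping
used here is an assumption: mask bit i stands for byte i of the
quadbyte, the first mask selects bytes Address[1:0] through 3, and the
last mask selects bytes 0 through Count[1:0]. The function names are
hypothetical. The single quadbyte case is handled by ANDing the two
masks, as described above.

```python
from typing import List


def first_mask(address_low: int) -> int:
    """Assumed FIG. 6a mapping: select bytes Address[1:0]..3 (bit i = byte i)."""
    return (0xF << (address_low & 0x3)) & 0xF


def last_mask(count_low: int) -> int:
    """Assumed FIG. 6b mapping: select bytes 0..Count[1:0] (bit i = byte i)."""
    return (1 << ((count_low & 0x3) + 1)) - 1


def quadbyte_masks(address: int, count: int) -> List[int]:
    """Return one 4-bit byte mask per quadbyte of the transaction."""
    n_quadbytes = (count >> 2) + 1  # Count[7:2] is offset-by-one
    first = first_mask(address)
    last = last_mask(count)
    if n_quadbytes == 1:
        # First and last quadbyte coincide: AND the two masks.
        return [first & last]
    # Intervening quadbytes are fully selected (mask 1111).
    return [first] + [0xF] * (n_quadbytes - 2) + [last]


# Single byte at byte 2 of a quadbyte: Count = 2 + 0 = 2.
print([format(m, "04b") for m in quadbyte_masks(2, 2)])
# Eight bytes starting at byte 1: Count = 1 + 7 = 8.
print([format(m, "04b") for m in quadbyte_masks(1, 8)])
```

Both lookups depend on only two bits each, which is why the receiver
needs no adder: a four-entry table per mask plus one AND suffices in
this model.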

FIGS. 7b-7d illustrate the masks generated for, respectively, a two
byte transfer, a four byte transfer and an eight byte transfer. The
masks are generated by simple logic bit manipulations which permit
simple and fast implementation at the receiver. The arrangement of the
bits in the packet is specific to this implementation and lends itself
to a space efficient implementation of the logic on the chip. The data
sizes correspond to a Count[7:0] value of 00000001, 00000011 and
00000111. For each data size, the four combinations of
MasterAddress[1:0] (which is equivalent to the value of Address[1:0])
will be shown, in order [illegible] 00, 01, 10, 11. The use of this
encoding and placement of the bits in the packet permit a reasonable
compromise between the logic complexity in the processor and the
complexity in the [illegible]

[illegible] by placing count bits 6, 4, 2 at bus cycle 4 of the packet
and count bits 7, 5, 3 at bus cycle 5 of the packet and respectively
on the same signal lines as count bits 6, 4, 2, the amount of wiring
needed to interconnect the bits with the logic which processes the
count bits is decreased. This saving is reflected in the decrease of
the die size. In particular, a carry function is utilized to process
the count bits. This is simply and efficiently implemented as bits 2
and 3, 4 and 5, 6 and 7, are aligned, eliminating the need to wire for
the carry operation between the bits 2 and 3, 4 and 5, 6 and 7.


While the invention has been described in conjunction with the
preferred embodiment, it is evident that numerous alternatives,
modifications, variations and uses will be apparent to those skilled
in the art in light of the foregoing description.



